Boundedness of iterates in Q-Learning
Author
Abstract
Reinforcement learning (RL) is a simulation-based counterpart of stochastic dynamic programming. In recent years it has been used to solve complex Markov decision problems (MDPs). Watkins' Q-Learning is by far the most popular RL algorithm for solving discounted-reward MDPs. The boundedness of the iterates in Q-Learning plays a critical role in its convergence analysis and in keeping the algorithm numerically stable, which makes it extremely attractive for numerical solution of MDPs. Previous results establish boundedness only asymptotically, in an almost sure sense. We present a new result that shows boundedness in an absolute sense under weaker conditions on the step size. Moreover, our proof rests on simple induction arguments.
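To make the setting concrete, here is a minimal sketch of tabular Watkins' Q-Learning for a discounted-reward MDP; it is an illustration, not the paper's construction. The environment interface (reset/step), the fixed exploration rate, and the per-pair step size alpha = 1/(number of visits) are all placeholder assumptions.

```python
import numpy as np

def q_learning(env, n_states, n_actions, gamma=0.9, episodes=500, eps=0.1):
    """Tabular Watkins' Q-Learning for a discounted-reward MDP.

    The per-state-action step size alpha = 1/visits is one common
    schedule satisfying the usual stochastic-approximation conditions
    (the alphas sum to infinity, their squares sum to a finite value).
    env is an assumed interface: reset() -> state,
    step(action) -> (next_state, reward, done).
    """
    Q = np.zeros((n_states, n_actions))
    visits = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if np.random.rand() < eps:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)
            visits[s, a] += 1
            alpha = 1.0 / visits[s, a]
            # Watkins' update: move Q(s,a) toward the sampled Bellman target
            target = r + gamma * np.max(Q[s_next])
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```

One simple induction argument of the kind the abstract alludes to: if |r| ≤ R_max, alpha ∈ (0, 1], and Q is initialized inside [−R_max/(1−γ), R_max/(1−γ)], then each update is a convex combination of two points of that interval (the old value and the sampled target), so every iterate stays inside it.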
Similar Resources
On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems
We consider a totally asynchronous stochastic approximation algorithm, Q-learning, for solving finite-space stochastic shortest path (SSP) problems, which are total-cost Markov decision processes with an absorbing, cost-free state. For the most commonly used SSP models, existing convergence proofs assume that the sequence of Q-learning iterates is bounded with probability one, or some other ...
Stochastic Shortest Path Games and Q-Learning
We consider a class of two-player zero-sum stochastic games with finite state and compact control spaces, which we call stochastic shortest path (SSP) games. They are total-cost stochastic dynamic games that have a cost-free termination state. Based on their close connection to single-player SSP problems, we introduce model conditions that characterize a general subclass of these games that have...
Q-Learning Algorithms with Random Truncation Bounds and Applications to Effective Parallel Computing
Motivated by an important problem of load balancing in parallel computing, this paper examines a modified algorithm to enhance Q-learning methods, especially in asynchronous recursive procedures for self-adaptive load distribution at runtime. Unlike the existing projection method that utilizes a fixed region, our algorithm employs a sequence of growing truncation bounds to ensure the boundedness ...
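The growing-truncation idea can be sketched in a few lines; the reset-to-zero rule and the doubling bound schedule below are illustrative assumptions, not the authors' exact scheme.

```python
import numpy as np

def truncated_step(q, delta, bound, grow=2.0):
    """One update with a growing truncation bound.

    q     : current iterate (np.ndarray)
    delta : the raw increment, e.g. alpha * (target - q)
    bound : current truncation radius

    If the updated iterate leaves the sup-norm ball of the current
    radius, reset it and enlarge the bound; otherwise accept the step.
    Returns (new iterate, new bound).
    """
    q_new = q + delta
    if np.max(np.abs(q_new)) > bound:
        return np.zeros_like(q), grow * bound  # truncate, widen the bound
    return q_new, bound
```

Because the bounds grow without shrinking, truncations can occur only finitely often once the iterates settle, which is what keeps the modified recursion bounded.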
Stochastic approximation for non-expansive maps: application to Q-learning algorithms
We discuss synchronous and asynchronous iterations of the form x_{k+1} = x_k + γ(k)(h(x_k) + w_k), where h is a suitable map and {w_k} is a deterministic or stochastic sequence satisfying suitable conditions. In particular, in the stochastic case, these are stochastic approximation iterations that can be analyzed using the ODE approach, based either on Kushner and Clark's lemma for the synchronous case or on...
Stochastic Approximation for Non-Expansive Maps: Application to Q-Learning Algorithms
We discuss synchronous and asynchronous variants of fixed-point iterations of the form x_{k+1} = x_k + γ(k)(F(x_k, ξ_k) − x_k), where F is a non-expansive mapping under a suitable norm and {ξ_k} is a stochastic sequence. These are stochastic approximation iterations that can be analyzed using the ODE approach, based either on Kushner and Clark's lemma for the synchronous case or Borkar's theorem for ...
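A minimal synchronous version of this iteration, assuming a user-supplied noisy evaluation F(x, rng) of the underlying map and the illustrative step-size schedule γ(k) = 1/(k+1):

```python
import numpy as np

def stochastic_iteration(F, x0, n_steps=10_000, seed=0):
    """Synchronous iteration x_{k+1} = x_k + gamma(k) * (F(x_k, xi_k) - x_k).

    F(x, rng) returns a noisy evaluation of the map being iterated;
    gamma(k) = 1/(k+1) sums to infinity while its squares sum to a
    finite value, the standard stochastic-approximation conditions.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for k in range(n_steps):
        gamma = 1.0 / (k + 1)
        x = x + gamma * (F(x, rng) - x)
    return x

# Illustrative use with a noisy linear contraction:
# x = stochastic_iteration(
#     lambda x, rng: 0.5 * x + 0.01 * rng.standard_normal(x.shape),
#     np.ones(3))
```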
Journal: Systems & Control Letters
Volume: 55, Issue: –
Pages: –
Publication year: 2006